Fix concurrent search and index delete #42621
Changed the order of listener invocation so that we notify before registering a search context and notify after unregistering it. This ensures that count up/down bookkeeping, like what we do in ShardSearchStats, works. Otherwise, we risk notifying onFreeScrollContext before notifying onNewScrollContext (the same applies to onFreeContext/onNewContext, but we currently have no assertions failing in those).
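To make the count up/down concern concrete, here is a minimal sketch of a listener-side counter. It is not the actual ShardSearchStats code: the class name and the explicit underflow check are assumptions made for illustration only.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a stats listener that counts open scroll contexts.
class ScrollCounterSketch {
    private final AtomicLong openScrolls = new AtomicLong();

    void onNewScrollContext() {
        openScrolls.incrementAndGet();
    }

    void onFreeScrollContext() {
        long current = openScrolls.decrementAndGet();
        if (current < 0) {
            // Restore the counter, then report the out-of-order notification.
            openScrolls.incrementAndGet();
            throw new IllegalStateException("free notified before new, count would be " + current);
        }
    }

    long current() {
        return openScrolls.get();
    }

    public static void main(String[] args) {
        ScrollCounterSketch stats = new ScrollCounterSketch();
        stats.onNewScrollContext();
        stats.onFreeScrollContext();
        System.out.println("in order, open scrolls: " + stats.current());
        try {
            stats.onFreeScrollContext(); // free without a matching new
        } catch (IllegalStateException e) {
            System.out.println("out of order: " + e.getMessage());
        }
    }
}
```

A counter like this only stays consistent if every free notification is preceded by its matching new notification, which is the ordering this change is meant to guarantee.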
Pinging @elastic/es-search
Pinging @elastic/es-distributed
I am not following what the issue is here and why we don't clean it up correctly today. Can you show me what fails here, and in what situation? Sorry for being slow...
if (request.scroll() != null) {
    openScrollContexts.incrementAndGet();
    context.indexShard().getSearchOperationListener().onNewScrollContext(context);
}
context.indexShard().getSearchOperationListener().onNewContext(context);
putContext(context);
I don't understand why you moved the putContext(context)
By registering the search context in activeContexts after having invoked onNewContext/onNewScrollContext, we guarantee that, for a specific SearchContext, the call to onNewXXX happens before the matching call to onFreeXXX.
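The ordering argument can be sketched in isolation. The following is a simplified model, not the SearchService implementation: `ContextRegistrySketch`, its `Listener` interface, and the use of a `String` as the context are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ContextRegistrySketch {
    // Hypothetical listener; the real SearchOperationListener has more callbacks.
    interface Listener {
        void onNewContext(long id);
        void onFreeContext(long id);
    }

    private final ConcurrentMap<Long, String> activeContexts = new ConcurrentHashMap<>();
    private final Listener listener;

    ContextRegistrySketch(Listener listener) {
        this.listener = listener;
    }

    void create(long id, String context) {
        listener.onNewContext(id);       // notify first
        activeContexts.put(id, context); // only now can another thread find and free it
    }

    boolean free(long id) {
        if (activeContexts.remove(id) != null) { // only one caller wins the removal
            listener.onFreeContext(id);          // notify after unregistering
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        ContextRegistrySketch registry = new ContextRegistrySketch(new Listener() {
            public void onNewContext(long id) { events.add("new:" + id); }
            public void onFreeContext(long id) { events.add("free:" + id); }
        });
        registry.create(1L, "ctx");
        registry.free(1L);
        registry.free(1L); // second free finds nothing and does not notify
        System.out.println(events);
    }
}
```

Because a context only becomes visible in the map after the new notification has run, a concurrent free (for example one triggered by index removal) cannot notify the listener before the matching new notification.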
I see, I didn't think about SearchService.afterIndexRemoved. My issue here is that we have to call onFreeXXX, but if one of onNewContext/onNewScrollContext fails, we don't register the context and the request fails? I think we need extra protection for this?
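One possible shape for such protection is sketched below. This is an assumption about how the concern could be addressed, not what the PR does: the class, its counters, and the `failOnNewContext` flag are hypothetical, and the sketch is single-threaded for brevity.

```java
import java.util.HashSet;
import java.util.Set;

class GuardedRegistrationSketch {
    // Hypothetical counters standing in for listener-side bookkeeping.
    int openContexts = 0;
    int openScrolls = 0;
    final Set<Long> activeContexts = new HashSet<>();

    void onNewScrollContext(long id) { openScrolls++; }

    void onFreeScrollContext(long id) { openScrolls--; }

    void onNewContext(long id, boolean fail) {
        if (fail) {
            throw new IllegalStateException("listener failed for context " + id);
        }
        openContexts++;
    }

    void register(long id, boolean scroll, boolean failOnNewContext) {
        boolean scrollNotified = false;
        boolean success = false;
        try {
            if (scroll) {
                onNewScrollContext(id);
                scrollNotified = true;
            }
            onNewContext(id, failOnNewContext);
            activeContexts.add(id);
            success = true;
        } finally {
            // If a later step failed, undo the notification that already went out,
            // since the context was never registered and will never be freed.
            if (!success && scrollNotified) {
                onFreeScrollContext(id);
            }
        }
    }

    public static void main(String[] args) {
        GuardedRegistrationSketch sketch = new GuardedRegistrationSketch();
        try {
            sketch.register(1L, true, true);
        } catch (IllegalStateException e) {
            System.out.println("registration failed, open scrolls: " + sketch.openScrolls);
        }
    }
}
```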
// ensure that if index is deleted concurrently, we free the context immediately, either here or in afterIndexRemoved
try {
    indicesService.indexServiceSafe(request.shardId().getIndex());
} catch (IndexNotFoundException e) {
I don't think you need the catch clause here - we free below
How did I miss that, thanks! Fixed in 8edda7b
but is this necessary or can we just rearrange the registration?
Thanks for reviewing, @s1monw. The problem we were facing is that the wrong order of invocation could occur in the following scenario:
|
boolean success = false;
try {
putContext(context);
if (request.scroll() != null) {
// ensure that if index is deleted concurrently, we free the context immediately, either here or in freeAllContextForIndex
indicesService.indexServiceSafe(request.shardId().getIndex());
I am still not sure why we do this here. Why can't we just let the search go and fail whenever?
It might be overkill... The way I understand the code, the primary purpose of afterIndexRemoved is to ensure that we free memory and file system resources as quickly as possible when an index is deleted/closed. Here we try to ensure the same happens also during a race condition. Without this, we theoretically risk holding on to memory/file system resources for a while after the index is deleted/closed. With this, we should free such resources as soon as the index is deleted (and any current search phases have run to completion).
I guess there is a trade-off in that doing this double validation has a small cost and the likelihood of seeing the race condition is very small. I can certainly remove this double validation if you think so?
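For reference, the double validation being discussed here can be pictured roughly as follows. This is a simplified sketch, not the SearchService code: `liveIndices`, `register`, and `removeIndex` are hypothetical names, with `removeIndex` playing the role of afterIndexRemoved.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class DoubleValidationSketch {
    private final Set<String> liveIndices = ConcurrentHashMap.newKeySet();
    // Maps a context id to the name of the index it searches.
    private final Map<Long, String> activeContexts = new ConcurrentHashMap<>();

    void addIndex(String index) {
        liveIndices.add(index);
    }

    // Analogous to afterIndexRemoved: drop the index, then free its contexts.
    void removeIndex(String index) {
        liveIndices.remove(index);
        activeContexts.values().removeIf(index::equals);
    }

    // Register first, then re-check that the index still exists.
    boolean register(long id, String index) {
        activeContexts.put(id, index);
        if (!liveIndices.contains(index)) {
            // The index was removed concurrently; free the context right away.
            activeContexts.remove(id);
            return false;
        }
        return true;
    }

    int activeCount() {
        return activeContexts.size();
    }

    public static void main(String[] args) {
        DoubleValidationSketch sketch = new DoubleValidationSketch();
        sketch.addIndex("logs");
        System.out.println("registered: " + sketch.register(1L, "logs"));
        sketch.removeIndex("logs");
        System.out.println("registered after delete: " + sketch.register(2L, "logs"));
        System.out.println("active contexts: " + sketch.activeCount());
    }
}
```

With this shape, a context is freed either by the re-check in `register` or by the scan in `removeIndex`, whichever observes the other's write. The cost is the extra lookup on every registration, which is the trade-off weighed in this thread.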
I think we should remove the double validation. But the fix you added in this PR is crucial!
Thanks @s1monw, I have removed the double validation, please have another look at your convenience.
Removed double check for index delete to avoid the performance overhead. Removed validation of store ref count, since IndexService periodically does a shard refresh, which disturbs this validation.
LGTM left one comment
return true;
}
return false;
}
}

private void onFreeContext(SearchContext context) {
Can you add an assertion here that ensures that we removed the context? I think it's trappy that we have this method and need to make sure that we never call it if we haven't removed the context. The special case is fina
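A check of the kind requested here might look like the sketch below. It is an illustration only: the class and method shapes are assumptions, and it uses an explicit exception rather than Java's `assert` keyword so that the check also fires when assertions are disabled.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FreeContextAssertionSketch {
    private final ConcurrentMap<Long, String> activeContexts = new ConcurrentHashMap<>();
    private int freed = 0;

    void putContext(long id, String context) {
        activeContexts.put(id, context);
    }

    boolean freeContext(long id) {
        if (activeContexts.remove(id) != null) {
            onFreeContext(id);
            return true;
        }
        return false;
    }

    // Must only be called once the context has been removed from activeContexts.
    void onFreeContext(long id) {
        if (activeContexts.containsKey(id)) {
            throw new IllegalStateException("context " + id + " is still registered");
        }
        freed++;
    }

    int freedCount() {
        return freed;
    }

    public static void main(String[] args) {
        FreeContextAssertionSketch sketch = new FreeContextAssertionSketch();
        sketch.putContext(1L, "ctx");
        try {
            sketch.onFreeContext(1L); // misuse: context has not been removed yet
        } catch (IllegalStateException e) {
            System.out.println("caught misuse: " + e.getMessage());
        }
        System.out.println("freed via freeContext: " + sketch.freeContext(1L));
    }
}
```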
Changed order of listener invocation so that we notify before registering search context and notify after unregistering same. This ensures that count up/down like what we do in ShardSearchStats works. Otherwise, we risk notifying onFreeScrollContext before notifying onNewScrollContext (same for onFreeContext/onNewContext, but we currently have no assertions failing in those). Closes #28053
All backports done.
…lls below the low watermark. Relates to elastic#39334 Tracking indicesToMarkIneligibleForAutoRelease instead of a Map and addressing other minor comments Unmute FullClusterRestartIT#testClosedIndices Fixed in #39566 Closes #39576 Add debug log for retention leases (#42557) We need more information to understand why CcrRetentionLeaseIT is failing. This commit adds some debug log to retention leases and enables them in CcrRetentionLeaseIT. Improve how internal representation of pipelines are updated (#42257) If a single pipeline is updated then the internal representation of all pipelines was updated. With this change, only the internal representation of the pipelines that have been modified will be updated. Prior to this change the IngestMetadata of the previous and current cluster was used to determine whether the internal representation of pipelines should be updated. If applying the previous cluster state change failed then subsequent cluster state changes that have no changes to IngestMetadata will not attempt to update the internal representation of the pipelines. This commit, changes how the IngestService updates the internal representation by keeping track of the underlying configuration and use that to detect against the new IngestMetadata whether a pipeline configuration has been changed and if so, then the internal pipeline representation will be updated. Fix RareClusterStateIT (#42430) * It looks like we might be cancelling a previous publication instead of the one triggered by the given request with a very low likelihood. 
* Fixed by adding a wait for no in-progress publications * Also added debug logging that would've identified this problem * Closes #36813 Update script-fields.asciidoc (#42490) Fixed typo in docker.asciidoc (#42455) Remove unused mapStringsOrdered method (#42513) Remove unused mapStringsOrdered method Dry up BlobStoreRepository#basePath Implementations (#42578) * This method is just a getter in every implementation => moved the field and concrete getter to the base class to simplify implementations Add Infrastructure to Run 3rd Party Repository Tests (#42586) * Add Infrastructure to Run 3rd Party Repository Tests * Add infrastructure to run third party repository tests using our standard JUnit infrastructure * This is a prerequisite of #42189 Add test ensure we can execute update requests in mixed cluster Relates #42596 Allocate to data-only nodes in ReopenWhileClosingIT (#42560) If all primary shards are allocated on the master node, then the verifying before close step will never interact with mock transport service. This change prefers to allocate shards on data-only nodes. Closes #39757 Reset mock transport service in CcrRetentionLeaseIT (#42600) testRetentionLeaseIsAddedIfItDisappearsWhileFollowing does not reset the mock transport service after test. Surviving transport interceptors from that test can sneaky remove retention leases and make other tests fail. 
Closes #39331 Closes #39509 Closes #41428 Closes #41679 Closes #41737 Closes #41756 Fixed ignoring name parameter for percolator queries (#42598) Closes #40405 [Ml Data Frame] Return bad_request on preview when config is invalid (#42447) Mute AsyncTwoPhaseIndexerTests#testStateMachine() (#42609) Relates #42084 [ML DataFrame] Use date histogram fixed_interval syntax and remove test skip Mute NodeTests (#42614) Relates #42577 Fix Incorrect Time Math in MockTransport (#42595) * Fix Incorrect Time Math in MockTransport * The timeunit here must be nanos for the current time (we even convert it accordingly in the logging) * Also, changed the log message when dumping stack traces a little to make it easier to grep for (otherwise it's the same as the message on unregister) Remove PRE_60_NODE_CHECKPOINT (#42527) This commit removes the obsolete `PRE_60_NODE_CHECKPOINT` constant for dealing with 5.x nodes' lack of sequence number support. Backported as #42531 Reset state recovery after successful recovery (#42576) The problem this commit addresses is that state recovery is not reset on a node that then becomes master with a cluster state that has a state not recovered flag in it. The situation that was observed in a failed test run of MinimumMasterNodesIT.testThreeNodesNoMasterBlock (see below) is that we have 3 master nodes (node_t0, node_t1, node_t2), two of them are shut down (node_t2 remains), when the first one comes back (renamed to node_t4) it becomes leader in term 2 and sends state (with state_not_recovered_block) to node_t2, which accepts. node_t2 becomes leader in term 3, and as it was previously leader in term1 and successfully completed state recovery, does never retry state recovery in term 3. 
Closes #39172 [DOCS] Escape cross-ref link comma for Asciidoctor (#42402) [DOCS] Fix API Quick Reference rollup attribute for Asciidoctor (#42403) [ML] adding delayed_data_check_config to datafeed update docs (#42095) * [ML] adding delayed_data_check_config to datafeed update docs * [DOCS] Edits delayed data configuration details Avoid loading retention leases while writing them (#42620) Resolves #41430. Validate routing commands using updated routing state (#42066) When multiple commands are called in sequence, fetch shards from mutable, up-to-date routing nodes to ensure each command's changes are visible to subsequent commands. This addresses an issue uncovered during work on #41050. remove 6.4.x version constants (#42127) relates refactoring initiative #41164. [ML Data Frame] Set DF task state when stopping (#42516) Set the state to stopped prior to persisting [DOCS] Reorg monitoring configuration for re-use (#42547) Remove suppresions for "unchecked" for hamcrest varargs methods (#41528) In hamcrest 2.1 warnings for unchecked varargs were fixed by hamcrest using @SafeVarargs for those matchers where this warning occurred. This PR is aimed to remove these annotations when Matchers.contains ,Matchers.containsInAnyOrder or Matchers.hasItems was used Remove support for chained multi-fields. (#42333) Follow-up to #41926, where we deprecated support for multi-fields within multi-fields. Addresses #41267. Lazily compute Java 8 home in reindex configuration (#42630) In the reindex from old tests we require Java 8. Today when configuring the reindex from old tests, we eagerly evalulate Java 8 home, which means that we require JAVA8_HOME to be set even if the reindex from old test tasks are not in the task graph. This is an onerous requirement if, for example, all that you want to do is build a distribution. This commit addresses this by making evaluation of Java 8 home lazy, so that it is only done and required if the reindex from old test tasks would be executed. 
Remove "nodes/0" folder prefix from data path (#42489) With the removal of node.max_local_storage_nodes, there is no need anymore to keep the data in subfolders indexed by a node ordinal. This commit makes it so that ES 8.0 will store data directly in $DATA_DIR instead of $DATA_DIR/nodes/$nodeOrdinal. Upon startup, Elasticsearch will check to see if there is data in the old location, and automatically move it to the new location. This automatic migration only works if $nodeOrdinal is 0, i.e., multiple node instances have not previously run on the same data path, which required for node.max_local_storage_nodes to explicitly be configured. [DOCS] Set explicit anchors for Asciidoctor (#42521) unmute 'Test url escaping with url mustache function' and bump logging (#42400) check position before and after latch (#42623) check position before and after latch [DOCS] Fix X-Pack tag for Asciidoctor (#42443) fix javadoc of SearchRequestBuilder#setTrackTotalHits (#42219) [ML Data Frame] Mute stop start test Relates to https://github.com/elastic/elasticsearch/issues/42650 Add 7.1.2 version constant. (#42643) Relates to #42635 Adjust use of Deprecated Netty API (#42613) * With the recent upgrade to Netty 4.1.36 this method became deprecated and I made the advised change to fix the deprecation Fix a callout in the field alias docs. Add explicit build flag for experimenting with test execution cacheability (#42649) * Add build flag for ignoring random test seed as task input * Fix checkstyle violations Use correct global checkpoint sync interval (#42642) A disruption test case need to use a lower checkpoint sync interval since they verify sequence numbers after the test waiting max 10 seconds for it to stabilize. 
Closes #42637 Removes types from SearchRequest and QueryShardContext (#42112) [ML-DataFrame] rewrite start and stop to answer with acknowledged (#42589) rewrite start and stop to answer with acknowledged fixes #42450 Added param ignore_throttled=false when indicesOptions.ignoreThrottled() is false (#42393) and fixed test RequestConvertersTests and added ignore_throttled on all request [DOCS] Set explicit anchors for TLS/SSL settings (#42524) Testclusters: convert ccr tests (#42313) un-mute ActivateWatchTests, bump up logging, and remove explicit sleeps (#42396) un-mute Watcher rolling upgrade tests and bump up logging (#42377) Fixes watcher test to remove typed api call Muting WatcherRestIT webhook url escaping test See #41172 [DOCS] Adds more monitoring tagged regions Add warning scores are floats (#42667) Allow aggregations using expressions to use _score (#42652) _score was removed from use in aggregations using expressions unintentionally when script contexts were added. This allows _score to once again be used. Refactor HLRC RequestConverters parameters to be more explicit (#42128) The existing `RequestConverters.Params` is confusing, because it wraps an underlying request object and mutations of the `Params` object actually mutate the `Request` that was used in the construction of the `Params`. This leads to a situation where we create a `RequestConverter.Params` object, mutate it, and then it appears nothing happens to it - it appears to be unused. What happens behind the scenes is that the Request object is mutated when methods on `Params` are invoked. This results in unclear, confusing code where mutating one object changes another with no obvious connection. This commit refactors `RequestConverters.Params` to be a simple helper class to produce a `Map` which must be passed explicitly to a Request object. This makes it apparent that the `Params` are actually used, and that they have an effect on the `request` object explicit and easier to understand. 
Co-authored-by: Ojas Gulati <[email protected]> Propogate version in reindex from remote search (#42412) This is related to #31908. In order to use the external version in a reindex from remote request, the search request must be configured to request the version (as it is not returned by default). This commit modifies the search request to request the version. Additionally, it modifies our current reindex from remote tests to randomly use the external version_type. Fix inverted condition so we never cache rest integ tests Remove unused import Geo: Refactor libs/geo parsers (#42549) Refactors the WKT and GeoJSON parsers from an utility class into an instantiatable objects. This is a preliminary step in preparation for moving out coordinate validators from Geometry constructors. This should allow us to make validators plugable. Detect when security index is closed (#42191) If the security index is closed, it should be treated as unavailable for security purposes. Prior to 8.0 (or in a mixed cluster) a closed security index has no routing data, which would cause a NPE in the cluster change handler, and the index state would not be updated correctly. This commit fixese that problem Fix testTokenExpiry flaky test (#42585) Test was using ClockMock#rewind passing the amount of nanoseconds in order to "strip" nanos from the time value. This was intentional as the expiration time of the UserToken doesn't have nanosecond precision. However, ClockMock#rewind doesn't support nanos either, so when it's called with a TimeValue, it rewinds the clock by the TimeValue's millis instead. This was causing the clock to go enough millis before token expiration time and the test was passing. Once every few hundred times though, the TimeValue by which we attempted to rewind the clock only had nanos and no millis, so rewind moved the clock back just a few millis, but still after expiration time. 
This change moves the clock explicitly to the same instant as expiration, using clock.setTime and disregarding nanos. Revert "un-mute Watcher rolling upgrade tests and bump up logging (#42377)" This reverts commit 697c793dcbabf1df0351d75a3705047ac4435dca. Log leader and handshake failures by default (#42342) Today the `LeaderChecker` and `HandshakingTransportAddressConnector` do not log anything above `DEBUG` level. However there are some situations where it is appropriate for them to log at a higher level: - if the low-level handshake succeeds but the high-level one fails then this indicates a config error that the user should resolve, and the exception will help them to do so. - if leader checks fail repeatedly then we restart discovery, and the exception will help to determine what went wrong. Resolves #42153 Deprecate CommonTermsQuery and cutoff_frequency (#42619) * Deprecate CommonTermsQuery and cutoff_frequency Since the max_score optimization landed in Elasticsearch 7, the CommonTermsQuery is redundant and slower. Moreover the cutoff_frequency parameter for MatchQuery and MultiMatchQuery is redundant. Relates to #27096 Fix Class Load Order in Netty4Plugin (#42591) * Don't force the logger in the Netty4Plugin class already, at this point log4j might not be fully initialized. * The call was redundant anyway since we do the same thing in the Netty4Transport and Netty4HttpServerTransport classes already and there we do it properly after setting up log4j by initilizing the loggers * Relates #42532 [DOCS] Rewrite 'wildcard' query (#42670) [DOCS] path_hierarchy tokenizer examples (#39630) Closes #17138 Fix error with mapping in docs Fix refresh remote JWKS logic (#42662) This change ensures that: - We only attempt to refresh the remote JWKS when there is a signature related error only ( BadJWSException instead of the geric BadJOSEException ) - We do call OpenIDConnectAuthenticator#getUserClaims upon successful refresh. 
- We test this in OpenIdConnectAuthenticatorTests. Without this fix, when using the OpenID Connect realm with a remote JWKSet configured in `op.jwks_path`, the refresh would be triggered for most configuration errors ( i.e. wrong value for `op.issuer` ) and the kibana wouldn't get a response and timeout since `getUserClaims` wouldn't be called because `ReloadableJWKSource#reloadAsync` wouldn't call `onResponse` on the future. [ML] [Data Frame] add support for weighted_avg agg (#42646) Remove unused Gradle plugin (#42684) Remove usage of deprecated compare gradle builds plugin (#42687) * Remove usage of deprecated compare gradle builds plugin * Remove system property only used by build comparison Prevent merging nodes' data paths (#42665) Today Elasticsearch does not prevent you from reconfiguring a node's `path.data` to point to data paths that previously belonged to more than one node. There's no good reason to be able to do this, and the consequences can be quietly disastrous. Furthermore, #42489 might result in a user trying to split up a previously-shared collection of data paths by hand and there's definitely scope for mixing the paths up across nodes when doing this. This change adds a check during startup to ensure that each data path belongs to the same node. Clarify the settings around limiting nested mappings. (#42686) * Previously, we mentioned multiple times that each nested object was indexed as its own document. This is repetitive, and is also a bit confusing in the context of `index.mapping.nested_fields.limit`, as that applies to the number of distinct `nested` types in the mappings, not the number of nested objects. We now just describe the issue once at the beginning of the section, to illustrate why `nested` types can be expensive. * Reference the ongoing example to clarify the meaning of the two settings. Addresses #28363. 
Make hashed token ids url safe (#42651) This commit changes the way token ids are hashed so that the output is url safe without requiring encoding. This follows the pattern that we use for document ids that are autogenerated, see UUIDs and the associated classes for additional details. [DOCS] Disable Metricbeat system module (#42601) Remove SecurityClient from x-pack (#42471) This commit removes the SecurityClient class from x-pack. This client class is a relic of the transport client, which is in the process of being removed. Some tests were changed to use the high level rest client and others use a client directly without the security client wrapping it. Remove Log4j 1.2 API as a dependency (#42702) We had this as a dependency for legacy dependencies that still needed the Log4j 1.2 API. This appears to no longer be necessary, so this commit removes this artifact as a dependency. To remove this dependency, we had to fix a few places where we were accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since both APIs were on the compile-time classpath). Finally, we can remove our custom Netty logger factory. This was needed when we were on Log4j 1.2 and handled logging in our own unique way. When we migrated to Log4j 2 we could have dropped this dependency. However, even then Netty would still pick up Log4j 1.2 since it was on the classpath, thus the advantage to removing this as a dependency now. Remove client jar support from build (#42640) The client jars were a way for modules and plugins to produce an additional jar that contained classes for use by the transport client. This commit removes that configuration as the transport client is being removed. relates #42638 mute failing search template test (#42730) tracking issue #42664. Remove groovy client docs (#42731) The groovy client api was a wrapper around the transport client. However, it has not been published since 2.4, as it had many issues with the java security manager. 
This commit removes the docs from master for the groovy client. relates #42638 Fix docs typo in the certutil CSR mode (#42593) Changes the mention of `cert` to `csr`. Co-Authored-By: Alex Pang <[email protected]> Remove transport client docs (#42483) This commit removes the transport client documentation. remove v6.5.x and v6.6.x version constants (#42130) related to refactoring initiative #41164. Log the status of security on license change (#42488) Whether security is enabled/disabled is dependent on the combination of the node settings and the cluster license. This commit adds a license state listener that logs when the license change causes security to switch state (or to be initialised). This is primarily useful for diagnosing cluster formation issues. Remove leftover transport module docs (#42734) This commit removes docs for alternate transport implementations which were removed years ago. These were missed because they have redirects masking their existsence. Add option to ObjectParser to consume unknown fields (#42491) ObjectParser has two ways of dealing with unknown fields: ignore them entirely, or throw an error. Sometimes it can be useful instead to gather up these unknown fields and record them separately, for example as arbitrary entries in a map. This commit adds the ability to specify an unknown field consumer on an ObjectParser, called with the field name and parsed value of each unknown field encountered during parsing. The public API of ObjectParser is largely unchanged, with a single new constructor method and interface definition. Return NO_INTERVALS rather than null from empty TokenStream (#42750) IntervalBuilder#analyzeText will currently return null if it is passed an empty TokenStream, which can lead to a confusing NullPointerException later on during querying. This commit changes the code to return NO_INTERVALS instead. 
Fixes #42587 [ML] [Data Frame] nesting group_by fields like other aggs (#42718) [ML Data Frame] Refactor stop logic (#42644) * Revert "invalid test" This reverts commit 9dd8b52c13c716918ff97e6527aaf43aefc4695d. * Testing * mend * Revert "[ML Data Frame] Mute Data Frame tests" This reverts commit 5d837fa312b0e41a77a65462667a2d92d1114567. * Call onStop and onAbort outside atomic update * Don’t update CS * Tidying up * Remove invalid test that asserted logic that has been removed * Add stopped event * Revert "Add stopped event" This reverts commit 02ba992f4818bebd838e1c7678bd2e1cc090bfab. * Adding check for STOPPED in saveState Re-enable token bwc tests (#42726) This commit re-enables token bwc tests that run as part of the rolling upgrade tests. These tests were muted while #42651 was being backported. [ML] Add Kibana application privilege to data frame admin/user roles (#42757) Data frame transforms are restricted by different roles to ML, but share the ML UI. To prevent the ML UI being hidden for users who only have the data frame admin or user role, it is necessary to add the ML Kibana application privilege to the backend data frame roles. [DOCS] Remove unneeded `ifdef::asciidoctor[]` conditionals (#42758) Several `ifdef::asciidoctor` conditionals were added so that AsciiDoc and Asciidoctor doc builds rendered consistently. With https://github.com/elastic/docs/pull/827, Elasticsearch Reference documentation migrated completely to Asciidoctor. We no longer need to support AsciiDoc so we can remove these conditionals. Resolves #41722 Remove CommonTermsQuery and cutoff_frequency param (#42654) Remove `common` query and `cutoff_frequency` parameter of `match` and `multi_match` queries. Both have already been deprecated for the next 7.x version. Closes: #37096 Clarify that inner_hits must be used to access nested fields. (#42724) This PR updates the docs for `docvalue_fields` and `stored_fields` to clarify that nested fields must be accessed through `inner_hits`. 
It also tweaks the nested fields documentation to make this point more visible. Addresses #23766. Remove locale-dependent string checking We were checking if an exception was caused by a specific reason "Not a directory". Alas, this reason is locale-dependent and can fail on systems that are not set to en_US.UTF-8. This commit addresses this by deriving what the locale-dependent error message would be and using that for comparison with the actual exception thrown. Closes #41689 [DOCS] Remove unneeded options from `[source,sql]` code blocks (#42759) In AsciiDoc, `subs="attributes,callouts,macros"` options were required to render `include-tagged::` in a code block. With elastic/docs#827, Elasticsearch Reference documentation migrated from AsciiDoc to Asciidoctor. In Asciidoctor, the `subs="attributes,callouts,macros"` options are no longer needed to render `include-tagged::` in a code block. This commit removes those unneeded options. Resolves #41589 address SmokeTestWatcherWithSecurityIT#testSearchInputWithInsufficientPrivileges (#42764) This commit adds busy wait and increases the interval for SmokeTestWatcherWithSecurityIT#testSearchInputWithInsufficientPrivileges. Watcher will not allow the same watch to be executed concurrently. If it finds that case, it will update the watch history with a "not_executed_already_queued" status. Given a slow machine, and 1 second interval this is possible. To address this, this commit increases the interval so the watch can fire at most 2 times with a greater interval between the executions and adds a busy wait for the expected state. While this does not gaurntee a fix, it should greatly reduce the chances of this test erroring. Remove XPackClient from x-pack (#42729) This commit removes the XPackClient class from x-pack. This class is a relic of the TransportClient and simply a wrapper around it. Calls are replaced with direct usage of a client. 
Additionally, the XPackRestHandler class has been removed as it only served to provide the XPackClient to implementing rest handlers. Remove MonitoringClient from x-pack (#42770) This commit removes the monitoring client from x-pack. This class is a relic of the TransportClient and was only used in a test. Use an anonymous inner class instead of lambda for UP-TO-DATE support remove v6.8.x version constant and the backcompat code that uses it (#42146) Remove Support for VERSION_CHECKPOINTS Translogs (#42782) * Closes #42699 Remove some leftover refs to minimum_master_nodes (#42700) Today `InternalTestCluster` has a few vestigial mentions of the `minimum_master_nodes` setting. This commit removes them and simplifies some of the surrounding logic. Create client-only AnalyzeRequest/AnalyzeResponse classes (#42197) This commit clones the existing AnalyzeRequest/AnalyzeResponse classes to the high-level rest client, and adjusts request converters to use these new classes. This is a prerequisite to removing the Streamable interface from the internal server version of these classes. [ML] Better detection of binary input in find_file_structure (#42707) This change helps to prevent the situation where a binary file uploaded to the find_file_structure endpoint is detected as being text in the UTF-16 character set, and then causes a large amount of CPU to be spent analysing the bogus text structure. The approach is to check the distribution of zero bytes between odd and even file positions, on the grounds that UTF-16BE or UTF16-LE would have a very skewed distribution. [Docs] Add example to reimplement stempel analyzer (#42676) Adding an example of how to re-implement the polish stempel analyzer in case a user want to modify or extend it. In order for the analyzer to be able to use polish stopwords, also registering a polish_stop filter for the stempel plugin. 
Closes #13150 Clarify heap setting in Docker docs (#42754) Add note in the Docker docs that even when container memory is limited, we still require specifying -Xms/-Xmx using one of the supported methods. [ML] Add a limit on line merging in find_file_structure (#42501) When analysing a semi-structured text file the find_file_structure endpoint merges lines to form multi-line messages using the assumption that the first line in each message contains the timestamp. However, if the timestamp is misdetected then this can lead to excessive numbers of lines being merged to form massive messages. This commit adds a line_merge_size_limit setting (default 10000 characters) that halts the analysis if a message bigger than this is created. This prevents significant CPU time being spent subsequently trying to determine the internal structure of the huge bogus messages. [DOCS] Adds redirect for deprecated `common` terms query (#42767) Make Connection Future Err. Handling more Resilient (#42781) * There were a number of possible (runtime-) exceptions that could be raised in the adjusted code and prevent resolving the listener * Relates #42350 Read the default pipeline for bulk upsert through an alias (#41963) This commit allows bulk upserts to correctly read the default pipeline for the concrete index that belongs to an alias. Bulk upserts are modeled differently from normal index requests such that the index request is a request inside of the update request. The update request (outer) contains the index or alias name is not part of the (inner) index request. This commit adds a secondary check against the update request (outer) if the index request (inner) does not find an alias. 
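The bulk upsert change above (#41963) describes a two-step lookup: try the inner index request first, then fall back to the outer update request, which carries the index or alias name. The following is a minimal sketch of that fallback only. The class, method names, and the two maps standing in for cluster metadata are hypothetical and are not taken from the Elasticsearch code base.

```java
import java.util.Map;
import java.util.Optional;

public class DefaultPipelineSketch {

    // Hypothetical helper: resolve a name (index or alias) to a default pipeline.
    // aliasToIndex maps alias -> concrete index; indexToPipeline maps concrete index -> default pipeline.
    static Optional<String> lookup(String name,
                                   Map<String, String> aliasToIndex,
                                   Map<String, String> indexToPipeline) {
        if (name == null) {
            return Optional.empty();
        }
        // If the name is an alias, follow it to the concrete index; otherwise treat it as an index name.
        String concrete = aliasToIndex.getOrDefault(name, name);
        return Optional.ofNullable(indexToPipeline.get(concrete));
    }

    // Assumed shape of the fix: check the inner request's target first,
    // then perform a secondary check against the outer update request's target.
    static Optional<String> resolve(String innerIndex,
                                    String outerIndexOrAlias,
                                    Map<String, String> aliasToIndex,
                                    Map<String, String> indexToPipeline) {
        Optional<String> fromInner = lookup(innerIndex, aliasToIndex, indexToPipeline);
        if (fromInner.isPresent()) {
            return fromInner;
        }
        return lookup(outerIndexOrAlias, aliasToIndex, indexToPipeline);
    }
}
```

With an alias `logs` pointing at `logs-000001`, a call such as `resolve(null, "logs", ...)` finds the pipeline through the outer name even though the inner request carries no usable target.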
RollupStart endpoint should return OK if job already started (#41502) If a job is started or indexing, RollupStart should always return a success (200 OK) response since the job is, in fact, started SQL: [Docs] Fix links syntax (#42806) Fix a couple of wrong links because of the order of the anchor and the usage of backquotes. More improvements to cluster coordination docs (#42799) This commit addresses a few more frequently-asked questions: * clarifies that bootstrapping doesn't happen even after a full cluster restart. * removes the example that uses IP addresses, to try and further encourage the use of node names for bootstrapping. * clarifies that auto-bootstrapping might form different clusters on different hosts, and gives a process for starting again if this wasn't what you wanted. * adds the "do not stop half-or-more of the master-eligible nodes" slogan that was notably absent. * reformats one of the console examples to a narrower width Remove "template" field in IndexTemplateMetaData (#42099) Remove "template" field from XContent parsing in IndexTemplateMetaData Fix error with test conventions on tasks that require Docker (#42719) [ML] [Data Frame] adding and modifying auditor messages (#42722) * [ML] [Data Frame] adding and modifying auditor messages * Update DataFrameTransformTask.java Make high level rest client a fat jar (#42771) The original intention of the high level rest client was to provide a single jar. We tried this long ago, but had issues with intellij not correctly resolving internal tests that relied on the HLRC. This commit tweaks our use of the shadow plugin so we now produce a correct fat jar (minus the LLRC and server jars, which we can address later), with the module "client" dependencies included, as well as the correct pom file omitting those dependencies. relates #42638 Add Basic Date Docs to Painless (#42544) [Docs] Add note for date patterns used for index search. 
(#42810) Add an explanatory NOTE section to draw attention to the difference between small and capital letters used for the index date patterns. e.g.: HH vs hh, MM vs mm. Closes: #22322 [Docs] Fix reference to `boost` and `slop` params (#42803) For `multi_match` query: link `boost` param to the generic reference for query usage and `slop` to the `match_phrase` query where its usage is documented. Fixes: #40091 Remove unnecessary usage of Gradle dependency substitution rules (#42773) Don't require TLS for single node clusters (#42826) This commit removes the TLS cluster join validator. This validator existed to prevent v6.x nodes (which mandated TLS) from joining an existing cluster of v5.x nodes (which did not mandate TLS) unless the 6.x node (and by implication the 5.x nodes) was configured to use TLS. Since 7.x nodes cannot talk to 5.x nodes, this validator is no longer needed. Removing the validator solves a problem where single node clusters that were bound to local interfaces were incorrectly requiring TLS when they recovered cluster state and joined their own cluster. OIDC Guide additions (#42555) - Call out the fact that the SSL Configuration is important and offer a minimal example of configuring a custom CA for trust. - Add information about the `op.issuer` that was missing and add information about the `rp.post_logout_redirect` in the example since `op.endsession_endpoint` was already mentioned there and these two should be together - Explain that `op.jwkset_path` can be a URL. 
[ML] [Data Frame] Adding supported aggs in docs (#42728) * [ML] [Data Frame] Adding supported aggs in docs * [DOCS] Moves pivot to definitions list [ML][Data Frame] forcing that no ptask => STOPPED state (#42800) * [ML][Data Frame] forcing that no ptask => STOPPED state * Addressing side-effect, early exit for stop when stopped [Docs] Add to preference parameter docs (#42797) Adding notes to the existing docs about how using `preference` might increase request cache utilization but also add warning about the downsides. Closes #24278 [DOCS] Fix broken bucket script agg link Refactor control flow in TransportAnalyzeAction (#42801) The control flow in TransportAnalyzeAction is currently spread across two large methods, and is quite difficult to follow. This commit tidies things up a bit, to make it clearer when we use pre-defined analyzers and when we use custom built ones. [DOCS] Fix typo in bucket script aggregation link Fix testNoMasterActionsWriteMasterBlock (#42798) This commit performs the proper restore of network disruption. Previously disruptionScheme.stopDisrupting() was called that does not ensure that connectivity between cluster nodes is restored. The test was checking that the cluster has green status, but it was not checking that connectivity between nodes is restored. Here we switch to internalCluster().clearDisruptionScheme(true) which performs both checks before returning. Closes #39688 Change shard allocation filter property and api (#42602) The current example is not working and is a bit confusing. This change tries to match it with the sample of the watcher blog. NullPointerException when creating a watch with Jira action (#41922) (#42081) A NullPointerException is thrown when secured_url does not use a proper scheme in the Jira action. This commit handles the exception and displays a proper message. Eclipse libs projects setup fix (#42852) Fallout from #42773 for eclipse users. 
Replicate aliases in cross-cluster replication (#41815) This commit adds functionality so that aliases that are manipulated on leader indices are replicated by the shard follow tasks to the follower indices. Note that we ignore write indices. This is due to the fact that follower indices do not receive direct writes so the concept is not useful. Fix version parsing in various tests (#42871) This commit fixes the version parsing in various tests. The issue here is that the parsing was relying on java.version. However, java.version can contain additional characters such as -ea for early access builds. See JEP 233:

    Name                            Syntax
    ------------------------------  --------------
    java.version                    $VNUM(\-$PRE)?
    java.runtime.version            $VSTR
    java.vm.version                 $VSTR
    java.specification.version      $VNUM
    java.vm.specification.version   $VNUM

Instead, we want java.specification.version. Adjust BWC version on aliases replication This commit adjusts the BWC version on aliases replication after the change has been backported to 7.x (currently versioned as 7.3.0). Enable testing against JDK 13 EA builds (#40829) This commit adds JDK 13 to the CI rotation for testing. For now, we will be testing against JDK 13 EA builds. Avoid clobbering shared testcluster JAR files when installing modules (#42879) Permit API Keys on Basic License (#42787) Kibana alerting is going to be built using API Keys, and should be permitted on a basic license. This commit moves API Keys (but not Tokens) to the Basic license Relates: kibana#36836 Deduplicate alias and concrete fields in query field expansion (#42328) The full-text query parsers accept field patterns that are expanded using the mapping. Alias fields are also detected during the expansion but they are not deduplicated with the concrete fields that are found from other patterns (or the same). This change ensures that we deduplicate the target fields of the full-text query parsers in order to avoid adding the same clause multiple times. 
Boolean queries are already able to deduplicate clauses during rewrite but since we also use DisjunctionMaxQuery it is preferable to detect these duplicates early on. Enable Parallel Deletes in Azure Repository (#42783) * Parallel deletes via private thread pool More logging in testRerouteOccursOnDiskPassingHighWatermark (#42864) This test is failing because recoveries of these empty shards are not completing in a reasonable time, but the reason for this is still obscure. This commit adds yet more logging. Relates #40174, #42424 Removes type from TermVectors APIs (#42198) Use reader attributes to control term dict memory useage (#42838) This change makes use of the reader attributes added in LUCENE-8671 to ensure that `_id` fields are always on-heap for best update performance and term dicts are generally off-heap on Read-Only engines. Closes #38390 Fix Stuck IO Thread Logging Time Precision (#42882) * The precision of the timestamps we get from the cached time thread is only 200ms by default resulting in a number of needless ~200ms slow network thread execution logs * Fixed by making the warn threshold a function of the precision of the cached time thread found in the settings Enable console audit logs for docker (#42671) Enable audit logs in docker by creating console appenders for audit loggers. also rename field @timestamp to timestamp and add field `type` with value audit The docker build contains now two log4j configuration for oss or default versions. The build now allows override the default configuration. Also changed the format of a timestamp from ISO8601 to include time zone as per this discussion https://github.com/elastic/elasticsearch/pull/36833#discussion_r244225243 closes #42666 [ML] Change dots in CSV column names to underscores (#42839) Dots in the column names cause an error in the ingest pipeline, as dots are special characters in ingest pipeline. 
This PR changes dots into underscores in CSV field names suggested by the ML find_file_structure endpoint _unless_ the field names are specifically overridden. The reason for allowing them in overrides is that fields that are not mentioned in the ingest pipeline can contain dots. But it's more consistent that the default behaviour is to replace them all. Fixes elastic/kibana#26800 Disable building on JDK 13 in CI This commit disables building on JDK 13 in CI. The reason for this is that Gradle is not yet ready to run on JDK 13. We could re-introduce infrastructure to enable Gradle to run on a different JDK than the build JDK, but rather than introducing such complexity we will instead wait for Gradle to be ready to run on JDK 13. Add Ability to List Child Containers to BlobContainer (#42653) * Add Ability to List Child Containers to BlobContainer * This is a prerequisite of #42189 Fix Azure Plugin Compilation Issue Fix Infinite Loops in ExceptionsHelper#unwrap (#42716) * Fix Infinite Loops in ExceptionsHelper#unwrap * Keep track of all seen exceptions and break out on loops * Closes #42340 Add custom metadata to snapshots (#41281) Adds a metadata field to snapshots which can be used to store arbitrary key-value information. This may be useful for attaching a description of why a snapshot was taken, tagging snapshots to make categorization easier, or identifying the source of automatically-created snapshots. Omit JDK sources archive from bundled JDK (#42821) Clean Up Painless Datetime Docs (#42869) This change abstracts the specific types away from the different representations of datetime as a datetime representation in code can be all kinds of different things. This defines the three most common types of datetimes as numeric, string, and complex while outlining the type most typically used for these as long, String, and ZonedDateTime, respectively. Documentation uses the definitions while examples use the types. 
This makes the documentation easier to consume especially for people from a non-Java background. Optimize Snapshot Finalization (#42723) * Optimize Snapshot Finalization * Delete index-N blobs and segment blobs in one single bulk delete instead of in separate ones to save RPC calls on implementations that have bulk deletes implemented * Don't fail snapshot because deleting old index-N failed, this results in needlessly logging finalization failures and makes analysis of failures harder going forward as well as incorrect index.latest blobs Make sibling pipeline agg ctor's protected (#42808) SiblingPipelineAggregator is a public interface, but the ctor was package-private. These should be protected so that plugin authors can extend and implement their own sibling pipeline agg. [DOCS] Adds discovery.type (#42823) Co-Authored-By: David Turner <[email protected]> [Docs] Clarify caveats for phonetic filters replace option (#42807) The `replace` option in the phonetic token filter can have surprising side effects, such as those described in #26921. This PR adds a note to be mindful about such scenarios and offers alternatives to using the `replace` option. Closes #26921 Skip installation of pre-bundled integ-test modules (#42900) Mute failing test Remove alpha/beta/rc from version constants (#42778) Prerelease qualifiers were moved outside of Version logic within Elasticsearch for 7.0.0, where they are now just an external modifier on the filename. However, they still existed inside code to support 6.x constants. Now that those constants have been removed in master, the prerelease logic can be removed. Skip shadow jar logic for javadoc and sources jars (#42904) For shadow jars, we place the original jar in a build/libs directory. This is to avoid clobbering the original jar when building the shadow jar. However, we need to skip this logic for javadoc and sources jars otherwise they would never be copied to the build/distributions directory during assembly. 
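The `ExceptionsHelper#unwrap` fix listed earlier in this log (#42716) is summarized as "keep track of all seen exceptions and break out on loops". A minimal sketch of that idea follows. The class name and method signature are hypothetical and the real helper may differ; the only assumption carried over from the commit message is the use of a "seen" set to stop on cyclic cause chains.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class UnwrapSketch {

    // Walk the cause chain looking for an instance of one of the given classes.
    // An identity-based "seen" set stops the walk if the chain loops back on itself.
    static Throwable unwrap(Throwable t, Class<?>... classes) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        // seen.add returns false when the throwable was already visited, which ends the loop.
        while (current != null && seen.add(current)) {
            for (Class<?> clazz : classes) {
                if (clazz.isInstance(current)) {
                    return current;
                }
            }
            current = current.getCause();
        }
        return null;
    }
}
```

An identity set is used here rather than a `HashSet` so that exception classes overriding `equals` cannot cause distinct throwables to be treated as already seen.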
Use jar task name constant in BuildPlugin Rather than comparing to a raw string, this commit uses a built-in constant to refer to the jar task name. Relates #42904 Remove the transport client (#42538) This commit removes the transport client and all remaining uses in the code. Correct versions limits for snapshot metadata field (#42911) Now that the snapshot metadata field has been backported, the version restrictions used in tests and for serialization need to corrected. [ML-DataFrame] increase the scheduler interval to 10s (#42845) increases the scheduler interval to fire less frequently, namely changing it from 1s to 10s. The scheduler interval is used for retrying after an error condition. [ML-DataFrame] reduce log spam: do not trigger indexer if state is indexing or stopping (#42849) reduce log spam: do not trigger indexer if state is indexing or stopping [ML] Add earliest and latest timestamps to field stats (#42890) This change adds the earliest and latest timestamps into the field stats for fields of type "date" in the output of the ML find_file_structure endpoint. This will enable the cards for date fields in the file data visualizer in the UI to be made to look more similar to the cards for date fields in the index data visualizer in the UI. [ML] Close sample stream in find_file_structure endpoint (#42896) A static code analysis revealed that we are not closing the input stream in the find_file_structure endpoint. This actually makes no difference in practice, as the particular InputStream implementation in this case is org.elasticsearch.common.bytes.BytesReferenceStreamInput and its close() method is a no-op. However, it is good practice to close the stream anyway. 
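The stream-closing change above (#42896) is about closing an input stream even when `close()` happens to be a no-op for the concrete implementation. A minimal sketch of the general pattern with try-with-resources is shown here. The class and method are illustrative only and do not reproduce the find_file_structure code.

```java
import java.io.IOException;
import java.io.InputStream;

public class SampleReaderSketch {

    // Consume a sample stream and always close it, regardless of whether
    // the underlying implementation's close() does any work.
    static int countBytes(InputStream in) throws IOException {
        try (InputStream sample = in) {
            int count = 0;
            while (sample.read() != -1) {
                count++;
            }
            return count;
        } // sample.close() is invoked here, also when read() throws
    }
}
```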
Mute testEnableDisableBehaviour (#42929) [ML] [Data Frame] Adding pending task wait to the hlrc cleanup (#42907) Add a merge policy that prunes ID postings for soft-deleted but retained documents (#40741) * Add a merge policy that prunes soft-deleted postings This change adds a merge policy that drops all postings for documents that are marked as deleted. This is usually unnecessary unless soft-deletes are used with a rentention policy since otherwise a merge would remove deleted documents anyway. Yet, this merge policy prevents extreme cases where a very large number of soft-deleted documents are retained and are impacting search and update perfromance. Note, using this merge policy will remove all search capabilities for soft-deleted documents. * fix checkstyle * fix assertion * fix imports * fix compilation * add predicate to select fields to prune * only purne ID field * beef up test * roll back retention query * foo * remove redundant modifier * fix assumption about empty Terms * remove null check * Add test for the engine to check if we prune the IDs of retained docs away Mute failing testPerformActionAttrsRequestFails (#42933) [ML][Data Frame] pull state and states for indexer from index (#42856) * [ML][Data Frame] pull state and states for indexer from index * Update DataFrameTransformTask.java Revert "Add a merge policy that prunes ID postings for soft-deleted but retained documents (#40741)" This reverts commit 186b52c5738688b72543d9353539468e719fafce github messed up the commit message due to a retry. A followup commit will add this change again with a corrected commit message. Add a merge policy that prunes ID postings for soft-deleted but retained documents (#40741) This change adds a merge policy that drops all _id postings for documents that are marked as soft-deleted but retained across merges. This is usually unnecessary unless soft-deletes are used with a retention policy since otherwise a merge would remove deleted documents anyway. 
Yet, this merge policy prevents extreme cases where a very large number of soft-deleted documents are retained and are impacting update performance. Note, using this merge policy will remove all lookup by ID capabilities for soft-deleted documents. configure auto expand for dataframe indexes (#42924) creates the dataframe destination index with auto expand for replicas (0-1) Fix NPE when rejecting bulk updates (#42923) Single updates use a different internal code path than updates that are wrapped in a bulk request. While working on a refactoring to bring both closer together I've noticed that bulk updates were failing some of the tests that single updates passed. In particular, bulk updates cause NullPointerExceptions to be thrown and listeners not being properly notified when being rejected from the thread pool. Fix testPendingTasks (#42922) Fixes a race in the test which can be reliably reproduced by adding Thread.sleep(100) to the end of IndicesService.processPendingDeletes Closes #18747 Fix `InternalEngineTests#testPruneAwayDeletedButRetainedIds` The test failed because we had only a single document in the index that got deleted such that some assertions that expected at least one live doc failed. Relates to: #40741 [TEST] Remove unnecessary log line [DOCS] Rewrite terms query (#42889) Reindex max_docs parameter name (#41894) Previously, a reindex request had two different size specifications in the body: * Outer level, determining the maximum documents to process * Inside the source element, determining the scroll/batch size. The outer level size has now been renamed to max_docs to avoid confusion and clarify its semantics, with backwards compatibility and deprecation warnings for using size. Similarly, the size parameter has been renamed to max_docs for update/delete-by-query to keep the 3 interfaces consistent. Finally, all 3 endpoints now support max_docs in both body and URL. Relates #24344 [DOCS] Move 'Scripting' section to top-level navigation. 
(#42939) shrink may full copy when using multi data paths (#42913) Additional scenario for full segment copy if hard link cannot work across disks. Fix concurrent search and index delete (#42621) Changed order of listener invocation so that we notify before registering search context and notify after unregistering same. This ensures that count up/down like what we do in ShardSearchStats works. Otherwise, we risk notifying onFreeScrollContext before notifying onNewScrollContext (same for onFreeContext/onNewContext, but we currently have no assertions failing in those). Closes #28053 Wire query cache into sorting nested-filter computation (#42906) Don't use Lucene's default query cache when filtering in sort. Closes #42813 Make PR template reference supported architectures (#42919) This commit changes the GitHub PR template to refer to supported "OS and architecture" (rather than use OS) since we only accept PRs for x86_64 (and not Linux ARM, s390, etc) Relax timeout in NodeConnectionsServiceTests (#42934) Today we assert that the connection thread is blocked by the time the test gets to the barrier, but in fact this is not a valid assertion. The following `Thread.sleep()` will cause the test to fail reasonably often. 
```diff
diff --git a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
index 193cde3180d..0e57211cec4 100644
--- a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
+++ b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
@@ -364,6 +364,7 @@ public class NodeConnectionsServiceTests extends ESTestCase {
             final CheckedRunnable<Exception> connectionBlock = nodeConnectionBlocks.get(node);
             if (connectionBlock != null) {
                 try {
+                    Thread.sleep(50);
                     connectionBlock.run();
                 } catch (Exception e) {
                     throw new AssertionError(e);
```

This change relaxes the test to allow some time for the connection thread to hit the barrier. Fixes #40170 Improve translog corruption detection (#42744) Today we test for translog corruption by incrementing a byte by 1 somewhere in a file, and verify that this leads to a `TranslogCorruptionException`. However, we rely on _all_ corruptions leading to this exception in the `RemoveCorruptedShardDataCommand`: this command fails if a translog file corruption leads to a different kind of exception, and `EOFException` and `NegativeArraySizeException` are both possible. This commit strengthens the translog corruption detection tests by simulating the following:

- a bit is flipped
- all bits are cleared or set
- a random value is written
- the file is truncated

It also makes sure that we return a `TranslogCorruptionException` in all such cases. Fixes #42661 Fix FsRepositoryTests.testSnapshotAndRestore (#42925) * The commit generation can be 3 or 2 here -> fixed by checking the actual generation on the second commit instead of hard coding 2 * Closes #42905 Only ignore IOException when fsyncing on dirs (#42972) Today in the method IOUtils#fsync we ignore IOExceptions when fsyncing a directory. 
However, the catch block here is too broad, for example it would be ignoring IOExceptions when we try to open a non-existent file. This commit addresses that by scoping the ignored exceptions only to the invocation of FileChannel#force. Remove Comma in Example (#41873) The comma is there in error as there are no other parameters after 'value' [ML][Data frame] make sure that fields exist when creating progress (#42943) [TEST] Adding a BWC test for ML categorization config (#42981) This test coverage was previously missing. Remove WatcherClient from x-pack (#42815) This commit removes the WatcherClient and WatcherRestHandler from the codebase. The WatcherClient was a convenience wrapper around the transport client, which is being removed so the client no longer serves a purpose. The WatcherRestHandler is no longer needed as its primary purpose was to provide a WatcherClient to the implementing handlers. Remove the CcrClient (#42816) This commit removes the CcrClient class, which is a wrapper around the transport client. The transport client is being removed so the client is no longer needed. Remove the ILMClient (#42817) This commit removes the ILMClient class, which is a wrapper around the transport client. This class is not used in the codebase and the transport client is being removed. [DOCS] Add explicit `articles_case` parameter to Elision Token Filter example (#42987) Update default shard count per index in readme (#42388) The default shard count has been reduced from 5 to 1. This commit updates the readme to reflect that changed default. 
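The fsync change above (#42972) narrows the ignored exceptions to the call to `FileChannel#force`. The sketch below illustrates that scoping under stated assumptions: the class name and the `isDir` parameter are hypothetical, and the real `IOUtils#fsync` may handle more cases (for example the Windows directory case addressed later in this log). Failures while opening the path propagate; only a failure from `force` on a directory is swallowed.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FsyncSketch {

    static void fsync(Path path, boolean isDir) throws IOException {
        // Opening is outside the lenient block: a missing path or a
        // permissions problem is reported to the caller.
        try (FileChannel channel = FileChannel.open(
                path, isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
            try {
                channel.force(true);
            } catch (IOException e) {
                if (isDir) {
                    // Some platforms cannot fsync a directory; ignore only this failure.
                    return;
                }
                throw e;
            }
        }
    }
}
```

With the broader catch block described in the commit message, a call on a non-existent directory would have returned silently; in this sketch it surfaces as an `IOException` from `FileChannel.open`.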
[ML][Data Frame] allow null values for aggs with sparse data (#42966) * [ML][Data Frame] allow null values for aggs with sparse data * Making classes static, memory allocation optimization Drop dead code for socket permissions for transport (#42990) This code has not been needed since the removal of tribe nodes, it was left behind when those were dropped (note that regular transport permissions are handled through transport profiles, even if they are not explicitly in use). Fix possible NPE in put mapping validators (#43000) When applying put mapping validators, we apply all the validators in the collection. If a failure occurs, we collect that as a top-level exception, and suppress any additional failures into the top-level exception. However, if a request passes the validator after a top-level exception has been collected, we would try to suppress a null exception into the top-level exception. This is a violation of the Throwable#addSuppressed API. This commit addresses this, and adds a test to cover the logic of collecting the failures when validating a put mapping request. Fix put mapping request validators random test This commit fixes a test bug in the request validators random test. In particular, an assertion was not properly nested in a guard that would ensure that there was at least one failure. Relates #43000 Fix IOUtils#fsync on Windows fsyncing directories (#43008) Fsyncing directories on Windows is not possible. We always suppressed this by allowing that an AccessDeniedException is thrown when attempting to open the directory for reading. Yet, this suppression also allowed other IOExceptions to be suppressed, and that was a bug (e.g., the directory not existing, or a filesystem error and reasons that we might get an access denied there, like genuine permissions issues). This leniency was previously removed yet it exposed that we were suppressing this case on Windows. 
Rather than relying on exceptions for flow control and continuing to suppress there, we simply return early if attempting to fsync a directory on Windows (we will not put this burden on the caller). Mute testLookupSeqNoByIdInLucene Tracked at #42979 Mute AutodetectMemoryLimitIT#testTooManyPartitions Relates #43013 Fix assertion in ReadOnlyEngine (#43010) We should execute the assertion before throwing an exception; otherwise, it's a noop. Unmuted testRecoverBrokenIndexMetadata These tests should be okay as we flush at the end of peer recovery. Closes #40867 Refactor put mapping request validation for reuse (#43005) This commit refactors put mapping request validation for reuse. The concrete case that we are after here is the ability to apply effectively the same framework to indices aliases requests. This commit refactors the put mapping request validation framework to allow for that. Do not allow modify aliases on followers (#43017) Now that aliases are replicated by a follower from its leader, this commit prevents directly modifying aliases on follower indices. Adjust IndicesAliasesRequest origin BWC version The work to add the origin field to the IndicesAliasesRequest has been backported to 7.x. Since this version is currently 7.3.0, this commit adjusts the version in master accordingly. Add note to CCR docs regarding alias replication This commit adds a note to the docs regarding the automatic replication of aliases by a follower index from its leader index. Add note to CCR docs about mapping/alias updates This commit adds a note to the docs clarifying that it is not possible to manually update the mapping nor the aliases of a follower index. Unmute PermissionsIT test and enable debug logging for it (#42876) This unmutes `testWhenUserLimitedByOnlyAliasOfIndexCanWriteToIndexWhichWasRolledoverByILMPolicy` and enables DEBUG logging. The failure from this test case from a query running rather than ILM itself, so more information is needed. 
Relates to #41440 Since SQL is GA, remove the sql language plugin from this list (#41533) SQL: cover the Integer type when extracting values from _source (#42859) * Take into consideration a wider range of Numbers when extracting the values from source, more specifically - BigInteger and BigDecimal. Allow routing commands with ?retry_failed=true (#42658) We respect allocation deciders, including the `MaxRetryAllocationDecider`, when executing reroute commands. If you specify `?retry_failed=true` then the retry counter is reset, but today this does not happen until after trying to execute the reroute commands. This means that if an allocation has repeatedly failed, but you want to take control and assign a shard to a particular node to work around the repeated failures, you cannot execute the routing command in the same call to `POST /_cluster/reroute` as the one that resets the failure counter. This commit fixes this by resetting the failure counter first, meaning that you can now explicitly allocate a repeatedly-failed shard like this:

```
POST /_cluster/reroute?retry_failed=true
{
  "commands": [
    {
      "allocate_replica": {
        "index": "blahblah",
        "shard": 2,
        "node": "node-4"
      }
    }
  ]
}
```

Fixes #39546 Fix auto fuzziness in query_string query (#42897) Setting `auto` after the fuzzy operator (e.g. `"query": "foo~auto"`) in the `query_string` does not take the length of the term into account when computing the distance and always uses a max distance of 1. This change fixes this discrepancy by ensuring that the term is passed when the fuzziness is computed. Don't run build-tools integ tests on FIPS (#42986) These run Gradle and FIPS isn't supported Closes #41721 Fix typo in create-index.asciidoc (#41806) Update regexp-syntax.asciidoc (#43021) Corrects a typo. 
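The auto fuzziness fix above (#42897) hinges on the edit distance depending on the term length. As a rough illustration, the sketch below assumes the documented default `AUTO` thresholds (terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two). The class and method names are hypothetical and this is not the Elasticsearch implementation.

```java
public class AutoFuzzinessSketch {

    // Assumed default AUTO thresholds of 3 and 6 characters.
    static int autoEditDistance(String term) {
        int length = term.codePointCount(0, term.length());
        if (length < 3) {
            return 0; // very short terms must match exactly
        }
        if (length < 6) {
            return 1; // one edit allowed
        }
        return 2;     // two edits allowed
    }
}
```

Under these assumptions `foo~auto` resolves to a distance of 1 while a seven-character term resolves to 2, which is the length dependence the commit says was lost when the term was not passed to the fuzziness computation.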
Update search-settings.asciidoc (#43016)
Grammar and spelling fixes

[ML] Re-enable integration test (#41712)

Move construction of custom analyzers into AnalysisRegistry (#42940)
Both TransportAnalyzeAction and CategorizationAnalyzer have logic to build custom analyzers for index-independent analysis. A lot of this code is duplicated, and it requires the AnalysisRegistry to expose a number of internal provider classes, as well as making some assumptions about when analysis components are constructed.

This commit moves the build logic directly into AnalysisRegistry, reducing the registry's API surface considerably.

Improve documentation for smart_cn analyzer (#42822)

Correct the description of generate_word_parts (#43026)

Clean up configuration when docker isn't available (#42745)
We initially added `requireDocker` as a way for tasks to say that they absolutely must have it, like the build docker image tasks. Projects using the test fixtures plugin are not in this boat, as the intent with these is that they will be skipped if docker and docker-compose are not available.

Before this change we were lenient: the docker image build would succeed but produce nothing. The implementation was also confusing, as it was not immediately obvious that this was the case due to all the indirection in the code.

The reason we have this leniency is that when we added the docker image build, docker was a fairly new requirement for us, and we didn't have it deployed in CI widely enough, nor had CI configured to prefer workers with docker when possible. We are in a much better position now. The other reason was other stack teams running `./gradlew assemble` in their respective CI and the possibility of breaking them if docker is not installed.
We have been advocating for building specific distros for some time now, and I will also send out an additional notice.

The PR also removes the use of `requireDocker` from tests that actually use test fixtures and are ok without it, and fixes a bug in test fixtures that would cause incorrect configuration and allow some tasks to run when docker was not available and they shouldn't have.

Closes #42680 and #42829; see also #42719

Better Exception in NetworkUtilsTests (#42109)
* We are still running into an exception here every so often
* Adjusted exception to contain interface name
* Relates to #41549

Fix GCS Blob Repository 3rd Party Tests (#43030)
* We have to strip the trailing slash from child names here like we do for AWS
* closes #43029

[DOCS] Change `// TESTRESPONSE[_cat]` to `// TESTRESPONSE[non_json]` (#43006)

[ML] Get resources action should be lenient when sort field is unmapped (#42991)
Get resources action sorts on the resource id. When there are no resources at all, it is possible that the index does not contain a mapping for the resource id field. In that case, the search API fails by default.

This commit adjusts the search request to ignore unmapped fields.

Closes elastic/kibana#37870

Mute AzureDiscoveryClusterFormationTests (#43049)
Relates #43048

Fix IpFilteringIntegrationTests (#43019)
* Increase timeout to 5s since we saw 500ms+ GC pauses on CI
* closes #40689

Increase waiting time when check retention locks (#42994)
WriteActionsTests#testBulk and WriteActionsTests#testIndex sometimes fail with a pending retention lock. We might leak retention locks when switching to async recovery. However, it's more likely that ongoing recoveries prevent the retention lock from releasing.

This change increases the waiting time when we check for no pending retention lock and also ensures no ongoing recovery in WriteActionsTests.

Closes #41054

[ML][Data Frame] Removes slice specification from DBQ.
See #42996 (#43036)

Rename processor test fix (#43035)
If the source field name is a prefix of the target field name, the source field still exists after the rename processor has run. Adjusted the test case to handle that case.

Default distro run creates elastic-admin user (#43004)
When using gradle run by itself, this uses the default distro with a basic license and enables security. There is a setup command to create an elastic-admin user, but only when the license is a trial license. Now that security is available with the basic license, we should always run this command when using the default distribution.

Fixing handling of auto slices in bulk scroll requests (#43050)
* Fixing handling of auto slices in bulk scroll requests
* adjusting assertions for tests

Unmute IndexFollowingIT#testFollowIndex
Fixed in #41987

Fix NPE in CcrRetentionLeaseIT (#43059)
The retention leases stats is null if the processing shard copy is being closed. In this case, we should check for null and then retry to avoid failing a test.
Closes #41237

[ML] Changes slice specification to auto. See #42996 (#43039)

[ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds (#42969)
* [ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds
* only supporting doc_values for geo_point fields
* moving validation into GeoPointField ctor

Upgrade AWS SDK to Latest Version (#42708)
* Just staying up to date on the SDK version
* Use `AbstractAmazonEC2` to shorten code

Better test diag output on OOM (#42989)
If linearizability checking fails with OOM (or another exception), we did not get the serialized history written into the log, making it difficult to debug in cases where the problem is hard to reproduce. Fixed to always attempt dumping the serialized history.
Related to #42244

Refresh remote JWKs on all errors (#42850)
It turns out that key rotation on the OP can manifest as both a BadJWSException and a BadJOSEException in nimbus-jose-jwt.
As such we cannot depend on matching only BadJWSExceptions to determine if we should poll the remote JWKs for an update.

This has the side effect that a remote JWKs source will also be polled exactly one additional time for errors that have to do with configuration, or for errors that might be caused by unsynchronized clocks, forged JWTs, etc. (These throw a BadJWTException, which also extends BadJOSEException.)

Split search in two when made against throttled and non throttled searches (#42510)
When a search on some indices takes a long time, it may cause problems for other indices that are being searched as part of the same search request and being written to as well, because their search context needs to stay open for a long time. This is especially a problem when searching against throttled and non-throttled indices as part of the same request. The problem can be generalized though: this may happen whenever read-only indices are searched together with indices that are being written to. In practice, search contexts staying open for a long time are only an issue for indices that are being written to.

This commit splits the search in two sub-searches: one for read-only indices, and one for ordinary indices. This way the two don't interfere with each other. The split is done only when size is greater than 0, no scroll is provided and query_then_fetch is used as search type. Otherwise, the search executes like before.

Note that the returned num_reduce_phases reflects the number of reduction phases that were run. If the search is split in two, there are three reductions: one non-final for each search, and a final one that merges the results of the previous two.
Closes #40900

[DOCS] Clarify phrase suggester docs smoothing parameter (#42947)
Closes #28512

remove path from rest-api-spec (#41452)

SQL: Clarify that the connections the jdbc driver creates are not pooled (#42992)

Restructure the SQL Language section to have proper sub-sections (#43007)
Rest docs page update
- have the section be on separate pages
- add an Overview page
- add other formats examples

Increase test logging for testSyncedFlushSkipOutOfSyncReplicas
Relates to #43086

Rename TESTRESPONSE[_cat] to TESTRESPONSE[non_json] (#43087)

Documents the new deprecations op…

Changed the order of listener invocation so that we notify listeners
before registering a search context and after unregistering it.
This ensures that up/down counting, as done in ShardSearchStats,
works. Otherwise, we risk notifying onFreeScrollContext before notifying
onNewScrollContext (the same applies to onFreeContext/onNewContext, but
we currently have no assertions failing for those).
Closes #28053
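
The ordering described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the actual SearchService or ShardSearchStats code: `ListenerOrderingSketch`, `CountingListener` and `Registry` are hypothetical names, and the listener is reduced to a single scroll counter. The point it tries to show is that a context only becomes visible to a concurrent free after the "new" notification has fired, and that the "free" notification only fires for the caller that actually removed the context.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class ListenerOrderingSketch {

    /** Hypothetical stand-in for a stats listener that counts scroll contexts up and down. */
    static final class CountingListener {
        private final AtomicLong openScrolls = new AtomicLong();

        void onNewScrollContext() {
            openScrolls.incrementAndGet();
        }

        void onFreeScrollContext() {
            long remaining = openScrolls.decrementAndGet();
            // Trips if a free notification ever arrives before its matching new notification.
            if (remaining < 0) {
                throw new IllegalStateException("onFreeScrollContext notified before onNewScrollContext");
            }
        }

        long openScrolls() {
            return openScrolls.get();
        }
    }

    /** Hypothetical registry of active contexts; the value records whether the context is a scroll. */
    static final class Registry {
        private final Map<Long, Boolean> activeContexts = new ConcurrentHashMap<>();
        private final AtomicLong idGenerator = new AtomicLong();
        private final CountingListener listener;

        Registry(CountingListener listener) {
            this.listener = listener;
        }

        long createContext(boolean scroll) {
            long id = idGenerator.incrementAndGet();
            // Notify first: no other thread can look the context up yet.
            if (scroll) {
                listener.onNewScrollContext();
            }
            // Register last: from here on a concurrent freeContext may find it.
            activeContexts.put(id, scroll);
            return id;
        }

        boolean freeContext(long id) {
            // Unregister first: only one caller can win the remove.
            Boolean scroll = activeContexts.remove(id);
            if (scroll == null) {
                return false;
            }
            // Notify last: the matching "new" notification has already happened.
            if (scroll) {
                listener.onFreeScrollContext();
            }
            return true;
        }
    }

    public static void main(String[] args) {
        CountingListener listener = new CountingListener();
        Registry registry = new Registry(listener);
        long id = registry.createContext(true);
        registry.freeContext(id);
        System.out.println("open scroll contexts: " + listener.openScrolls());
    }
}
```

With the reverse order (register, then notify), a concurrent free could remove the context and decrement the counter before the increment has happened, which is the kind of assertion failure the commit message refers to.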